Bootstrapping under constraint for the assessment of group behavior in human contact networks

Nicolas Tremblay,1 Alain Barrat,2,3,4 Cary Forest,5 Mark Nornberg,5
Jean-François Pinton,1 and Pierre Borgnat1

1 Physics Laboratory, ENS Lyon, Université de Lyon, CNRS UMR 5672, Lyon, France
2 Aix Marseille Université, CNRS, CPT, UMR 7332, 13288 Marseille, France
3 Université de Toulon, CNRS, CPT, UMR 7332, 83957 La Garde, France
4 Data Science Laboratory, Institute for Scientific Interchange (ISI) Foundation, Torino, Italy 
5 University of Wisconsin, Physics Department, Madison, USA 
(Dated: December 17, 2012) 

The increasing availability of time- and space-resolved data describing human activities and
interactions gives insights into both static and dynamic properties of human behavior. In practice, 
nevertheless, real-world datasets can often be considered as only one realisation of a particular 
event. This highlights a key issue in social network analysis: the statistical significance of estimated 
properties. In this context, we focus here on the assessment of quantitative features of specific subsets
of nodes in empirical networks. We present a resampling method based on bootstrapping groups of
nodes under constraints within the empirical network. The method enables us to define confidence 
intervals for various Null Hypotheses concerning relevant properties of the subset of nodes under 
consideration, in order to characterize its behavior as "normal" or not. We apply this method to 
a high resolution dataset describing the face-to-face proximity of individuals during two co-located 
scientific conferences. As a case study, we show how to probe whether co-locating the two conferences 
succeeded in bringing together the two corresponding scientific communities. 

I. INTRODUCTION

High resolution measurements of face-to-face interactions between individuals in different social 
gatherings - such as scientific conferences, museums, schools, or hospitals - were made possible in
recent years by the use of wearable sensors, using Bluetooth, wireless or RFID (Radio Frequency
IDentification) technology. These new data paved the way to many empirical investigations [1-4]
of human contacts, both from a static (e.g., existence of communities, clustering, heterogeneities 
in the number of contacts...) and dynamic (distribution of the durations of contacts, of the time 
between contacts, or of the lifetime of groups of different sizes...) points of view. 

A major issue regarding the analysis of these datasets is that each represents a single realisation 
of a particular event: in contrast to the study of ensembles of random networks, it is not possible 
to generate multiple realizations of the event. Associating a statistical confidence to any measured 
property of these datasets is thus a challenging issue. In this context, various resampling methods 
have been proposed in the case of networks, in particular with the aim to assess the statistical 
significance of the empirical graph topology, for instance for phylogenetic trees [5] or
Bayesian-induced networks [6, 7]. Another application concerns the significance of community structures [8]
in networks. In the present article, we do not address the significance of the empirical graph
structure, which we consider as a fixed input, and we therefore do not try to generate resampled
versions of the empirical graph as a whole. We focus instead on the statistical significance of 
features characterizing specific groups of nodes within the graph. 

Two data-driven methods have been widely used in the general case to obtain confidence intervals 
for measurable features: the jackknife and bootstrapping [10]. Both are based on drawing
random samples from the unique original data recorded in an observation. Transposing the classical 
bootstrap approach to the case of data represented by graphs is however not straightforward; only 
a few works have considered resampling methods for graphs, in specific situations [11, 12]. In
this paper, we focus on features of groups of nodes, and we formulate a bootstrap protocol: we 
consider resampled versions of the group of interest within the graph, and compare the studied 
group with its resampled versions. This enables us to define statistical significance by comparing 
the features measured in the real data and the ones measured in the random pseudosamples found 
under adequately chosen constraints. 

Amongst other objectives, the proposed method gives estimates of the deviation of the behavior 
of a given group of nodes from a "normal" behavior (i.e., a Null Hypothesis for a statistical test) 
defined by the constrained random pseudosamples, enabling us to assess whether this given group's 
behavior is normal or anomalous. 

In order to illustrate the possibilities offered by this new resampling method, we apply it to 
the case study of a dataset describing the face-to-face interactions of individuals collected in two 
co-located conferences involving two distinct scientific communities: we will show how our method 
allows us to assess to what extent both communities mix together. Moreover, we test the
performance of our method in a well-controlled tunable setting. To this aim, we generalize the Chung-Lu
model of random graphs [13] to weighted networks, and we show how the bootstrapping method
is able to assess whether groups of nodes in such networks are normal, anomalous and/or possibly
rare.

The paper is structured in the following way. Section II presents the data and some of its
general properties. We introduce in Section III the resampling method and we apply it to the data
in Section IV. Finally, the weighted Chung-Lu graphs are introduced as benchmarks in Section V
and the performance of the method is assessed on this model of complex network. We conclude in
Section VI.



II. PRESENTATION OF THE DATASET OF TWO CO-LOCATED CONFERENCES 

A. Data and pre-processing 

We consider a dataset describing the face-to-face proximity of individuals collected in Salt Lake 
City (SLC) in November 2011 during two co-located scientific conferences lasting five days. These
conferences were jointly organised by the DPP (Division of Plasma Physics) of the American
Physical Society and the GEC (Gaseous Electronics Conference) in an attempt to bring together
both communities - mainly academic researchers and engineers respectively. The face-to-face
proximity of the participants was measured using the SocioPatterns sensing infrastructure [1, 14] based
on unobtrusive active RFID tags that can be embedded in conference badges. Two tags exchange
radio packets only if the individuals wearing them face each other (the human body acts as a 
shield at this frequency and power) within a distance of 1 to 1.5 meters. The detected proximity 
relations are reported by the tags to RFID readers installed in the environment. At the end of the 
conference, the raw data consists of a log of all the recorded contacts. The log is a sequence of lines 
(t, r, i, j) where t is the time at which reader r received the information that the individuals wearing
tags i and j were in close face-to-face proximity ("in contact"). Given the operating parameters of
the tags, proximity of two individuals wearing the RFID badges can be assessed with a probability 
in excess of 99% over an interval of 20 seconds [2], which is a fine enough time scale to resolve
human mobility and proximity at social gatherings. We therefore aggregate the raw data over time 
windows of 20 seconds: we partition the five days of data gathering into 20 second periods, and 
we associate to each of these periods t the adjacency matrix $A^t$ representing the aggregated graph
over the 20 seconds: $A^t_{ij} = 1$ if and only if vertices i and j have exchanged at least one radio packet
during the time window t, otherwise $A^t_{ij} = 0$.

Overall, the data define a temporal contact network in which nodes represent individuals, and a 
link between two nodes at time t denotes the fact that the corresponding individuals are in
face-to-face proximity. The temporal network can moreover be aggregated over the total duration of the
conference, defining a weighted contact network where each node is an individual and where the 
weight of a link between two individuals gives the cumulated time they have spent in face-to-face 
interaction during the conference. 
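
The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `aggregate_contacts`, the toy log and the reader labels are hypothetical, and each active 20-second window is assumed to contribute exactly 20 seconds to the weight of a link.

```python
from collections import defaultdict

WINDOW = 20  # seconds, the aggregation window used in the text

def aggregate_contacts(log, window=WINDOW):
    """Turn raw (t, r, i, j) records into per-window link sets and total weights.

    Returns (snapshots, weights): snapshots[k] is the set of pairs in contact
    during window k, and weights[(i, j)] is the cumulated contact time in seconds.
    """
    snapshots = defaultdict(set)
    for t, _reader, i, j in log:
        pair = (min(i, j), max(i, j))          # undirected link, stored once
        snapshots[int(t // window)].add(pair)  # several packets in a window count once
    weights = defaultdict(int)
    for pairs in snapshots.values():
        for pair in pairs:
            weights[pair] += window            # assumption: one active window = 20 s
    return dict(snapshots), dict(weights)

# Hypothetical toy log: (time in seconds, reader id, tag i, tag j)
toy_log = [(3, "r1", 1, 2), (12, "r2", 2, 1), (25, "r1", 1, 2), (70, "r1", 2, 3)]
snapshots, weights = aggregate_contacts(toy_log)
```

Here `snapshots` plays the role of the sequence of binary adjacency matrices (stored sparsely as sets of pairs), and `weights` is the weighted contact network aggregated over the whole event.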



B. Distributions of contact durations 



We first compare briefly the gathered data with other datasets collected in similar contexts 
using the same infrastructure. We define a contact between two tags i and j as an unbroken 
subsequence of 1's within the sequence $\{A^t_{ij}\}_t$. Its duration is the length of this subsequence.
Table I presents basic statistics of the present data, together with the ones of data collected
during the 2009 ACM Hypertext conference (HT09) [15] and during a congress of the Société
Française d'Hygiène Hospitalière (SFHH) [2]. Note that the sum of the total number of contacts
(and the total time of contact) within DPP and within GEC does not exactly account for the 
interactions for the conference taken as a whole (ALL), due to the interactions between DPP 
and GEC. The SLC data contain a relatively small number of contacts, in comparison with the 
other conferences, taking into account the number of participants and the duration: this is due to 
the small sampling rate of the total population of the SLC conferences. The distributions of the
durations of contacts are however very similar in the three contexts, displaying broad shapes with
no typical scale, as shown in Fig. 1a. Other statistical properties of the contact networks, such as
the distribution of degrees, of the inter-contact times or of the weights of the links, also display a
very similar behavior in these three contexts (see Appendix). This confirms the robustness of the
main statistical properties of the networks of face-to-face contacts between individuals observed in
previous works [2, 15].
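
As an illustration of the definition of a contact given above (an unbroken run of consecutive active time windows for a pair of tags), the following sketch extracts contact durations for one pair. The function name and the toy input are hypothetical; the input is assumed to be the list of indices of the 20-second windows in which the pair was detected.

```python
def contact_durations(active_windows, window=20):
    """Durations (in seconds) of unbroken runs of consecutive active windows."""
    durations = []
    run = 0            # length of the current run, in windows
    previous = None
    for k in sorted(set(active_windows)):
        if previous is not None and k == previous + 1:
            run += 1                       # the contact continues
        else:
            if run:
                durations.append(run * window)
            run = 1                        # a new contact starts
        previous = k
    if run:
        durations.append(run * window)     # close the last contact
    return durations

# Windows 0-2 form one contact, window 5 another, windows 9-10 a third one.
durations = contact_durations([0, 1, 2, 5, 9, 10])
```

Pooling these durations over all pairs gives the empirical distribution that is compared across datasets.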

In the present dataset, we can distinguish three categories of contacts: within DPP, within GEC, 
and between both communities. Figure 1b shows that even though the number of contacts is much
larger within DPP than within GEC (see Table I), the corresponding duration distributions
collapse remarkably well upon one another. Hence, we do not observe any difference in the statistical 
behavior of the three categories of contacts. Let us also note that we are not interested here in 
modeling these distributions (for instance by power-law or log-normal functional forms), as the 
method we will use is data-driven. It is however of interest to remark that the broad shape of 
the distributions implies that parametric statistical methods would be hard to implement, and that
data-driven statistical methods are expected to be more adequate. 



C. Distributions of the durations of contacts taking place in different places 



The conference venue is spatially heterogeneous, with in particular three broadly defined areas: 
the GEC Area where the GEC registration and coffee breaks took place; the Poster Hall, where the 

                                HT09    SFHH             SLC
                                                 GEC     DPP     ALL
# tags                           113     418      39     281     320
sample rate                      75%     33%     12%     16%     15%
# days                             2       2               5
# contacts                      9582   27434    1189   21519   23920
Tot. time of contact (hours)     102     414      18     306     339

TABLE I. Basic statistics concerning the datasets collected in three different scientific conferences. 

FIG. 1. a) Comparison of the distribution of the durations of contacts for three different datasets.
b) Cumulative distributions of the durations of contacts within the DPP conference, within the GEC conference
and between both conferences of the SLC experiment.



poster sessions of both conferences took place; and the Rest, which includes the DPP registration 
desk, two coffee break areas, and corridors linking different parts of the building. The GEC Area 
was situated 500 meters from the Poster Hall (maps are shown in the Appendix). It therefore
took time and energy to walk from one area to another, which did not favor interactions between both
communities. As the measuring infrastructure allows us to identify the area in which each reported
contact took place, it is interesting to investigate if differences exist between the three types of 
contacts defined above when the spatial information is taken into account. 

To this aim, we show in Fig. 2 the histograms of contact durations broken down by category
of contact and area. For the DPP contacts (figure on the left), the distributions measured in the
various areas have similar shapes, and the differences come from the overall number of contacts
measured in each area (as members of the DPP did not go much to the GEC area). On the other
hand, for the contacts between both communities (figure in the middle) and for the GEC contacts 
(figure on the right), different slopes are observed depending on the area of interest. Broader 
distributions are obtained in the Poster Hall, in particular for the contacts between GEC and 
DPP attendees: the Poster Hall was therefore a more favorable setting for long cross-community 
contacts, as could indeed be expected. 



III. BOOTSTRAPPING AND STATISTICAL TEST FOR COMPLEX NETWORKS 



As discussed in the Introduction, our main objective is to provide statistical confidence on the 
measurements of properties of subsets of nodes in networks. To this aim, a standard way is to 
formulate a Null Hypothesis for the normal behavior of a group, and to perform a statistical test 
to decide whether or not to reject this Null Hypothesis. In this section, we will propose a series of 
statistical tests in the context of weighted networks using a specific resampling method based on 
bootstrapping constrained groups in the network. 

We recall that bootstrapping [10] creates new random pseudosamples by using only one empirical

FIG. 2. Cumulative histograms of the durations of contacts in the three different areas within the SLC
conference. Results for: (left) contacts within the DPP community, (middle) contacts between communities
and (right) contacts within the GEC community.



observation of the data. The main advantage of using a bootstrap-inspired technique is that it does
not require any information beyond the network itself: it is data-driven. Moreover,
unlike other data-driven resampling methods such as the jackknife, it is possible to adapt the size
of the drawn samples to the size of what is studied. Consequently, bootstrapping methods remain 
adequate even if the size of the groups that are studied has an impact on the estimated features. 

An important issue when assessing significance of some features of a specific group in a network 
is that neither the nodes nor the links are independent from each other. This is reminiscent of
the issue of creating correlated bootstraps in block bootstrapping [16, 17]. We will discuss this
analogy in Section IV E. In order to use bootstrapping, it is required to propose a specific sampling
method to draw replicates of groups with relevant properties. The method we propose is composed 
of two steps: 1) we decide on a specific scheme to draw groups that correspond to a proposed Null 
Hypothesis by imposing constraints on the extracted groups; 2) we then build a bootstrap set of 
many such groups, by randomly sampling them independently and with replacement, as in classical 
bootstrapping. Combining these two steps, we are then able to propose a bootstrap test to decide 
whether the specific group of interest is compatible with the proposed Null Hypothesis. We detail 
the proposed method in the next paragraphs. 



A. Relevant observable features for groups in complex networks 

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be the graph representation of the studied complex network, with $\mathcal{V}$ its set of
nodes and $\mathcal{E}$ its set of edges. We call $X^0 \subset \mathcal{V}$ the chosen subset of nodes whose behavior we
compare to the behavior of "normal" groups, obtained as random bootstrap samples satisfying
given constraints as explained above. Let us call $R^0 \subset \mathcal{V}$ the remaining nodes of the network that
are not in $X^0$.

We quantify $X^0$'s "behavior" by looking at several observable features that are representative
of how the group is structured. In the context of social networks, relevant features are the ones 
that quantify whether there are strong contacts inside the group, possibly stronger than with other 
nodes. We choose here to use the following seven observable features (generically referred to as Z 
in the following), in addition to the cardinality M of the group $X^0$:

• $N_{XX}$, the total number of links of $\mathcal{E}$ between nodes of $X^0$;

• $N_{RR}$, the total number of links of $\mathcal{E}$ between nodes of $R^0$;

• $N_{XR}$, the total number of links of $\mathcal{E}$ connecting the two groups of nodes;

• $T_{XX}$, the total weight of links of $\mathcal{E}$ between nodes of $X^0$;

• $T_{RR}$, the total weight of links of $\mathcal{E}$ between nodes of $R^0$;

• $T_{XR}$, the total weight of the links connecting the two groups;

• $Q_X$, the modularity computed when partitioning the nodes of $\mathcal{G}$ in two groups $X^0$ and $R^0$.

In our case study, $N_{XX}$ corresponds to the number of pairs of participants within the group X
that have interacted at least once during the conference, and $T_{XX}$ corresponds to the total time of
contact between participants within the group X. We recall that the modularity is defined by [15]:

$$Q = \frac{1}{2N} \sum_{i \in \mathcal{V}, j \in \mathcal{V}} \left( A_{ij} - \frac{k_i k_j}{2N} \right) \delta(c_i, c_j),$$

where $A$ is the weighted adjacency matrix, $k_i = \sum_{j \in \mathcal{V}} A_{ij}$
is the strength of node $i$, $N = \frac{1}{2} \sum_{i \in \mathcal{V}, j \in \mathcal{V}} A_{ij}$ is the total number of links (counted with their weights), and $c_i$ is the label of
the group of node $i$ (1 or 2 here as there are 2 groups), so that $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are
in the same group, and 0 otherwise. In this case of a partition in two groups, the modularity is a
scalar between -0.5 and 0.5 and measures how well the partition separates the network into distinct
communities (a value close enough to 0.5 denotes two strong communities).
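
To make the seven observables concrete, here is a minimal Python sketch that computes them for a given subset of nodes. The function names and the toy graph are hypothetical, and the graph is assumed to be stored as a dictionary mapping each undirected link (listed once) to its weight. For a partition in two groups, the modularity formula reduces to an expression involving only the three total weights, which is what `two_group_modularity` uses.

```python
def two_group_modularity(t_xx, t_rr, t_xr):
    """Modularity of a two-group partition from the three total link weights."""
    total = t_xx + t_rr + t_xr          # N in the modularity formula
    if total == 0:
        return 0.0
    s_x = 2 * t_xx + t_xr               # summed strength of the group
    s_r = 2 * t_rr + t_xr               # summed strength of the rest
    return (t_xx + t_rr) / total - (s_x ** 2 + s_r ** 2) / (4 * total ** 2)

def group_observables(weights, group):
    """Return (N_XX, N_RR, N_XR, T_XX, T_RR, T_XR, Q_X) for a node subset."""
    group = set(group)
    n = {"XX": 0, "RR": 0, "XR": 0}
    t = {"XX": 0.0, "RR": 0.0, "XR": 0.0}
    for (i, j), w in weights.items():
        ends_inside = (i in group) + (j in group)   # 0, 1 or 2 endpoints in the group
        key = {2: "XX", 0: "RR", 1: "XR"}[ends_inside]
        n[key] += 1
        t[key] += w
    q = two_group_modularity(t["XX"], t["RR"], t["XR"])
    return n["XX"], n["RR"], n["XR"], t["XX"], t["RR"], t["XR"], q

# Hypothetical toy graph: two triangles joined by one link, unit weights.
toy_weights = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
               (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
obs = group_observables(toy_weights, {0, 1, 2})
```

As a consistency check, feeding the GEC totals reported in Table II (58820, 947100 and 45740 seconds for the internal, rest and cross weights) to `two_group_modularity` gives a value of about 0.100, in agreement with the tabulated modularity.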

The chosen observables are not fully independent and one might question why we consider so 
many. One of the most widely used observables regarding the behavior of a group in a network is 
the modularity [18]; however, modularity is neither a sufficient nor a unique way to discriminate 
between different types of behaviors. Two groups may have the same modularity but for very 
different reasons. By adding the six other observables that are admittedly not totally independent 
from the modularity, we accept some level of redundancy in the information we gather in order to 
yield a more complete and discriminative description of groups. 

Depending on the specific issue addressed and on the nature of the complex network at hand,
other observable features could be considered as relevant to describe the behavior of a group. We are 
here guided by the case study consisting in networks of face-to-face contacts between individuals, 
but we emphasize that the proposed procedure of bootstrap under constraints is directly usable in 
other contexts. 



B. Protocol of bootstrapping for statistical test about a group in a network 

The backbone of the developed method is the following. 

1. First, we formulate a Null Hypothesis regarding the behavior of $X^0$ as being "normal" given
a set of constraints on the group. This Null Hypothesis is converted into a specific set of
constraints on the type of groups that should be in the bootstrap set. More specifically, these
constraints define accepted values for the observable features of the bootstrap samples.

2. Second, we create a bootstrap set of $N_B$ such groups by sampling from the data groups of
nodes satisfying the constraints of the Null Hypothesis; we use X as a generic notation for the
bootstrap samples. We recall that bootstrapping corresponds to taking these sample graphs
with replacement.

3. For large enough $N_B$, we estimate the behavior of the graphs in the bootstrap set by
estimating the mean and the variance of the various features. Estimation of this mean and variance
for the Null Hypothesis is in essence fully data-driven and it defines a "normal behavior" for
groups under this particular Null Hypothesis (i.e., under this particular set of constraints).

4. We compare the behavior of $X^0$ to this "normal behavior", so as to decide whether or not
we can reject the Null Hypothesis. If there is a significant deviation between the observed
behavior of $X^0$ and the statistical behavior of the bootstrap samples, the Hypothesis is
rejected for $X^0$. In this case, one may compute a divergence measurement that evaluates
to what extent $X^0$ deviates from the bootstrap samples, and this measure will quantify the
level of confidence with which the Null Hypothesis is rejected. To this aim, we will define in
Section III C a suitable divergence d.

An important technical point is that the sampling method should allow us to draw sets of 
nodes that satisfy the chosen constraints. The simplest constraint is the cardinality constraint: we 
constrain the bootstrap samples' size to match $X^0$'s. This constraint is trivially achieved: for each
bootstrap sample, we randomly add nodes to it until its size reaches $X^0$'s.
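
The cardinality-constrained bootstrap set can be sketched as follows. The function name and parameters are hypothetical. Within one sample, nodes are assumed to be drawn without replacement (a node appears at most once in a group), while the samples themselves are drawn independently, so the same group may occur several times in the bootstrap set.

```python
import random

def cardinality_bootstrap(nodes, size, n_samples, seed=None):
    """Draw n_samples independent random groups of `size` distinct nodes."""
    rng = random.Random(seed)
    nodes = list(nodes)
    return [frozenset(rng.sample(nodes, size)) for _ in range(n_samples)]

# Hypothetical example: 1000 random groups of 39 nodes in a 320-node network,
# i.e. the sizes of the GEC group and of the whole SLC network in Table I.
bootstrap_set = cardinality_bootstrap(range(320), size=39, n_samples=1000, seed=0)
```

Each group in `bootstrap_set` can then be passed to whatever routine computes the observables, which gives the empirical distribution of each feature under this Null Hypothesis.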

Other Null Hypotheses lead us to impose stronger constraints by requiring one (or more)
observables to be the same in X as in $X^0$. For example, a stronger constraint is the "same $N_{XX}$"
constraint, in which we impose that each bootstrap sample has the same number of nodes and the
same number of internal links as $X^0$. In order to find bootstrap samples satisfying constraints
such as this one, we use a simulated annealing algorithm [19] as follows.

We start with a random set of nodes X, with the same cardinality as $X^0$, and we define the
cost C of X as the (absolute) difference between its number of internal links and the one of $X^0$. We use
an auxiliary "temperature" T that starts at high values. At each step of the simulated annealing
procedure, we keep some of the nodes of X and change the rest (the higher T is, the more nodes
we attempt to change). If the cost C' of the new group is lower than C, we accept the change. If
instead C' > C, we accept the change with probability $p \propto \exp\left(-\frac{C' - C}{T}\right)$. When the cost does not
decrease during several attempts, we lower the auxiliary temperature and start the whole process
again. We stop the algorithm as soon as X satisfies the constraint (as soon as C = 0). We repeat
this process $N_B$ times to obtain the whole bootstrap set corresponding to this constraint.



C. Normalization of features and choice of the divergence d 

Each observable Z is normalized into a dimensionless quantity z known as the "Z-score":
$z = \frac{Z - Z^*}{\sigma^*_Z}$, where $Z^*$ is the expected value and $\sigma^*_Z$ the standard deviation of the observable Z in a
random graph with same weight sequence. To construct these random graphs, let us consider the
full weight sequence (including the zero weights, corresponding to absent links), and randomly
re-allocate the weights within the ensemble of possible links (i.e., pairs of nodes). This randomizes
the degree of the nodes as well as their strengths (the strength is defined as the sum of the weights
of the links of a node) and the local topological structures, and only preserves the weight sequence.
This normalization may seem arbitrary, but this mode of representation is chosen for its clarity
(we can plot all 7 observables on the same figure) and because it removes the effects due to the
scale of the groups, allowing us to compare the results between different groups. Indeed, $Z^*$ and $\sigma^*_Z$
depend on $X^0$'s cardinality M. For each normalized observable z, we compute the mean $z^b$ and
the standard deviation $\sigma^b_z$ of the ensemble of bootstrap samples.
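
The normalization can be illustrated with the sketch below, which estimates the expected value and standard deviation of an observable in the weight-shuffled null model by Monte Carlo sampling. This is an assumption made for the example (the text does not say how these two quantities are obtained), as are the function names and the convention that links are stored as sorted pairs `(i, j)` with `i < j`.

```python
import itertools
import random
import statistics

def shuffled_weights(nodes, weights, rng):
    """Re-allocate the full weight sequence (zeros included) among all node pairs."""
    pairs = list(itertools.combinations(sorted(nodes), 2))
    sequence = [weights.get(pair, 0) for pair in pairs]   # absent links have weight 0
    rng.shuffle(sequence)
    return {pair: w for pair, w in zip(pairs, sequence) if w > 0}

def z_score(observable, nodes, weights, group, n_null=200, seed=None):
    """Z-score of observable(weights, group) against the weight-shuffled null model."""
    rng = random.Random(seed)
    null = [observable(shuffled_weights(nodes, weights, rng), group)
            for _ in range(n_null)]
    mean, std = statistics.fmean(null), statistics.pstdev(null)
    if std == 0:
        raise ValueError("the observable does not vary in the null model")
    return (observable(weights, group) - mean) / std

def total_internal_weight(weights, group):
    """Total weight of the links inside the group."""
    return sum(w for (i, j), w in weights.items() if i in group and j in group)

# Hypothetical toy graph: two triangles joined by one link, unit weights.
toy_weights = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
               (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
z_txx = z_score(total_internal_weight, range(6), toy_weights, {0, 1, 2}, seed=0)
```

In this toy case the group is a full triangle, so its internal weight is larger than what a random re-allocation of the seven unit weights among the fifteen possible pairs typically produces, and the Z-score is positive.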

As mentioned above, we also need to define a divergence d quantifying whether $X^0$ is far from the
bootstrap set or not. For each observable feature Z, we define a divergence $d_z$ as the distance
between $z_X$, the actual measured value for $X^0$, and the interval $[z^b - 3\sigma^b_z, z^b + 3\sigma^b_z]$ ($d_z = 0$
if $z_X$ is in the interval). This interval would have the meaning of an acceptance interval for the
Null Hypothesis if the observables were Gaussian. Indeed, more than 99% of the realisations of a
Gaussian random variable lie within 3 standard deviations of the mean [20]. The distance to the
interval thus measures the deviation of the observed value for $X^0$ from the core of the distribution.

The sum d of the divergences d z corresponding to the various observables is computed and will be 
retained as the global divergence measuring to what extent we have to reject the Null Hypothesis 
for $X^0$: the larger d, the higher our confidence level to reject the Null Hypothesis. If $X^0$ is in
the acceptance interval (at 99%) for each observable, then d is simply zero and the Null
Hypothesis is not rejected.
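
The divergence can be written compactly as in the sketch below. The function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the estimator is not specified in the text.

```python
import statistics

def divergence(z_observed, z_bootstrap, width=3.0):
    """Distance from z_observed to the interval mean +/- width * std of the bootstrap values."""
    mean = statistics.fmean(z_bootstrap)
    std = statistics.pstdev(z_bootstrap)
    low, high = mean - width * std, mean + width * std
    if z_observed < low:
        return low - z_observed
    if z_observed > high:
        return z_observed - high
    return 0.0                      # inside the acceptance interval

def global_divergence(observed, bootstrap):
    """Sum of the per-observable divergences; both arguments are dicts keyed by observable."""
    return sum(divergence(observed[name], bootstrap[name]) for name in observed)

# Hypothetical values: the bootstrap Z-scores have mean 0 and standard deviation 1.
d_inside = divergence(0.5, [-1.0, 1.0])
d_outside = divergence(5.0, [-1.0, 1.0])
d_total = global_divergence({"N_XX": 5.0, "T_XX": -4.0},
                            {"N_XX": [-1.0, 1.0], "T_XX": [-1.0, 1.0]})
```

A value of zero for the global divergence means that the group lies inside the acceptance interval for every observable considered.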



D. Final output of the constrained bootstrap method 



The validity of the classical unconstrained bootstrapping relies on an unbiased randomness in 
the choice of the samples. In our case, by imposing constraints on the bootstrap samples, we 
lose some randomness and introduce possible dependencies: while the divergence d is sufficient to 
summarize an unconstrained test's outcome, we need here to also be especially careful to track 
the bias introduced by the constraints. In the following, we propose a practical way to control the 
validity of the procedure. 

We track two indicators to monitor the bias introduced by constraints. The first one is the 
standard deviation $\sigma_u$ of the distribution of the number of times each node is chosen in a bootstrap
sample. It measures how uniformly a node is chosen in a bootstrap sample: the smaller $\sigma_u$ is, the
more uniform the choice of the nodes for the bootstrap set. The second indicator measures whether
nodes in $X^0$ are chosen more - or less - often in the bootstrap samples than they would be if there
were no constraints. For that, we compare the empirical distribution of the number of nodes
from $X^0$ that are in a bootstrap sample to the theoretical distribution valid if there were no
constraint. This theoretical probability distribution of drawing k nodes from $X^0$ after $M = |X^0|$
draws without replacement in a total set of $V = |\mathcal{V}|$ nodes in the complete network is given by the
hypergeometric law:

$$P(k) = \frac{\binom{M}{k} \binom{V - M}{M - k}}{\binom{V}{M}}.$$

We then compute the $\chi^2$ distance between the empirical
distribution and the theoretical hypergeometric distribution. In order to compare different $\chi^2$ values
obtained from different bootstrap tests, each $\chi^2$ value is computed with 10 bins that contain at
least five realisations. An important point is that we do not use $\chi^2$ for a goodness-of-fit test. We
indeed expect $\chi^2$ to increase as soon as we impose stronger constraints on the bootstrap samples.
Rather, we use $\chi^2$ and $\sigma_u$ as two control parameters of the "uniform character" of the bootstrapping
procedure, and check that they stay reasonably small.

Finally, the output of the proposed test is a triplet $(d, \chi^2, \sigma_u)$ which summarizes the outcome of
the test for $X^0$. The larger d is, the higher the confidence level to reject the Null Hypothesis. The
lower $\chi^2$ and $\sigma_u$, the less biased is the choice of the pseudosamples and the more valid is the test.
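
The two control indicators can be sketched as follows. The function names are hypothetical, and the example deliberately simplifies one point: the chi-square distance is computed directly over the possible overlap values, without the regrouping into 10 bins of at least five realisations described above.

```python
from collections import Counter
from math import comb
import statistics

def hypergeom_pmf(k, M, V):
    """Probability of drawing k nodes of a group of size M in M draws among V nodes."""
    return comb(M, k) * comb(V - M, M - k) / comb(V, M)

def sigma_u(samples, nodes):
    """Standard deviation of the number of times each node appears in the bootstrap set."""
    counts = Counter(v for sample in samples for v in sample)
    return statistics.pstdev([counts.get(v, 0) for v in nodes])

def chi2_overlap(samples, group, n_nodes):
    """Chi-square distance between observed overlaps with `group` and the hypergeometric law."""
    group = set(group)
    M = len(group)
    observed = Counter(len(group & set(sample)) for sample in samples)
    chi2 = 0.0
    for k in range(M + 1):
        expected = len(samples) * hypergeom_pmf(k, M, n_nodes)
        if expected > 0:                       # skip impossible overlap values
            chi2 += (observed.get(k, 0) - expected) ** 2 / expected
    return chi2

# Hypothetical toy bootstrap set of two groups in a 4-node network.
toy_samples = [{0, 1}, {0, 2}]
spread = sigma_u(toy_samples, range(4))
distance = chi2_overlap(toy_samples, {0, 1}, n_nodes=4)
```

Both quantities are meant to be monitored across tests rather than compared to a significance threshold: they are expected to grow as the constraints become stronger.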



IV. CASE STUDY: BOOTSTRAPPING UNDER CONSTRAINTS FOR SPECIFIC 
GROUPS OF ATTENDEES IN THE CONFERENCE 

A. Choosing the groups and the Null Hypotheses 

The original question of interest for the organizers of the SLC conferences is whether co-locating 
both conferences was worthwhile, i.e., whether the GEC and DPP communities mixed together. As 
previously shown, contacts did occur between individuals registered to the GEC and DPP. In order 
to give a quantitative answer to the question, one needs to compare the amount of interactions 
between GEC and DPP to some reference. In other words, are these interactions statistically 
significant? To answer such a question, the proposed bootstrap method is a natural candidate. 

It is first important to note that the seven chosen observables that characterize a group's
"behavior" actually involve observables measured within the group ($N_{XX}$ and $T_{XX}$), observables measured
within the rest of the network ($N_{RR}$ and $T_{RR}$), and observables measuring the interaction between
the group and the rest of the network ($N_{XR}$, $T_{XR}$ and $Q_X$). The terminology "group's
behavior" is used for simplicity, but the chosen observables also quantify the behavior of the group's
complement as well as the interaction of the group with the rest of the network. Quantifying
for instance "GEC's behavior" represents therefore a possible measurement of the mixing between
both communities, as the DPP individuals correspond precisely to the "rest" of the network. The
method presented above is a means not only to quantify, but also to validate statistically, the
normality - or abnormality - of GEC's behavior with respect to various Null Hypotheses. We thus
use this method for the group of GEC individuals, taken as the specific subset of interest X° in 
the face-to-face contact network between the attendees of the SLC conference. 

The different Null Hypotheses that define the bootstrap samples X are taken as constraints on 
the amount of interaction involving nodes of X. We consider five different Null Hypotheses, or sets
of constraints: in each case, the Null Hypothesis can be phrased as "$X^0$ has a behavior compatible
with a random group X of nodes satisfying the chosen set of constraints". We consider the following
series of constraints on each random group:

• the size of the group is fixed, equal to the one of X°; 

• in addition, the modularity of the partition of the network between the group and its
complement is equal to the one of the partition $(X^0, \mathcal{V} \setminus X^0)$;

• we will as well consider constraints on $N_{XX}$, $T_{XX}$ or $T_{XX} + T_{XR}/2$, imposing that they take
the same values as respectively $N_{X^0X^0}$, $T_{X^0X^0}$ or $T_{X^0X^0} + T_{X^0R^0}/2$, in addition to the size
constraint. These constraints correspond to ways of imposing a certain number of links or
a certain amount of interaction within the group, or between the group and the rest of the
network.

We moreover consider the possibility that all groups with a community behavior (as given
quantitatively by the seven observables) could appear as abnormal. We thus investigate the case of three
other specific groups of individuals that could a priori have a community behavior: the Students
from DPP, i.e. attendees preparing a PhD thesis (STP), the Juniors from DPP, i.e. researchers
with less than 10 years of professional experience (JUP), and the Seniors from DPP, i.e. researchers
with more than 10 years of experience (SEP). Table II summarizes the measured observables for
the GEC, STP, JUP and SEP. These groups can be considered a priori as communities because 
of similarities in age and professional status. We can therefore compare the tests' outputs for 
GEC and for these other groups: if their behavior is similar, it could be argued that the subgroup 
GEC simply behaves as if it were a subcommunity of DPP, and the conclusion could be that the 

Group   Cardinality   N_XX   N_XR   N_RR     T_XX     T_XR     T_RR     Q_X
GEC              39    101    120   1907    58820    45740   947100   0.100
STP             106    384    850    894   252900   356220   442540   0.145
JUP              73    183    766   1179    97600   303800   650260   0.073
SEP              99    226    704   1198   124280   310740   616640   0.095



TABLE II. The cardinality and the other seven observables of the four groups under study. The 3 time 
columns are in seconds. 




FIG. 3. Results for X° = GEC for a) test with the same cardinality constraint, b) test with the constraints 
of same cardinality and same modularity (8 — 5%). Left: histogram of the number of occurrences of each 
node in the bootstrap samples and its standard deviation a u . Right: histogram of the number of X°-nodes 
in a bootstrap sample with its \ 2 distance from the theoretical hypergeometric histogram (dotted line). 



co-location of the conference was an efficient way to bring together GEC and DPP. If instead GEC 
is significantly more abnormal than the three other groups, one may doubt the efficiency of the 
co-location. 

Our approach is therefore to test those four groups (i.e., the group denoted X° in the method will
in turn be GEC, STP, JUP, or SEP) against the same Null Hypotheses. We then compare
the degree to which the Null Hypotheses are rejected for each group. In order to understand
if GEC's behavior is peculiar, we consider several Null Hypotheses, i.e., sets of constraints on the 
bootstrap samples, and investigate if they significantly discriminate GEC from the other groups. 

We finally note that, in the following, the aggregated graph is pre-processed by deleting links 
between nodes that correspond to an aggregated contact time of the two corresponding individuals 
smaller than 1 minute over the whole conference. The threshold of 1 minute is chosen because 
smaller contact times can be considered as noise in the measurement, associated with very short
contacts. We have checked that our results are robust with respect to the filtering threshold: 
similar results are obtained when thresholding at 3 and 5 minutes. 



B. Cardinality constraint 



We first consider the following simple Null Hypothesis: GEC behaves like any random group of
M = 39 individuals in the conference. For this first Null Hypothesis, the only constraint we impose
on the bootstrap samples is therefore to have a cardinality equal to M.

Applying the protocol described in Section III B, we first randomly pick $N_b = 1000$ bootstrap
samples of 39 nodes. For each sample, we compute the seven associated observables and normalize them
as proposed in Section III C. For each observable Z, the mean $\bar{z}_b$ and the standard deviation
$\sigma_b$ are computed from the bootstrap samples: this defines what we call the "normal behavior"
of a group under this constraint. We then obtain the divergences $d_Z$ for each feature, and finally
the triplet $(d, \chi^2, \sigma_u)$.
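
The sampling step under the cardinality constraint can be sketched as follows. This is a minimal illustration on a toy ring network, not the protocol of Section III B: the helper names are hypothetical, only one observable (the number of internal links) is computed, and a plain standardized gap is used as a stand-in for the divergence $d_Z$, whose exact definition is given in Section III C.

```python
import random
import statistics


def sample_groups(nodes, size, n_samples, rng):
    """Draw n_samples groups of `size` distinct nodes, uniformly at random."""
    return [frozenset(rng.sample(nodes, size)) for _ in range(n_samples)]


def internal_links(edges, group):
    """Number of links with both endpoints inside the group."""
    return sum(1 for u, v in edges if u in group and v in group)


def standardized_gap(observed, bootstrap_values):
    """Gap between the observed value and the bootstrap mean, in bootstrap std units."""
    mean = statistics.fmean(bootstrap_values)
    std = statistics.pstdev(bootstrap_values)
    return (observed - mean) / std if std > 0 else 0.0


# Toy network: a ring of 20 nodes; the tested group is 5 consecutive nodes.
ring_nodes = list(range(20))
ring_edges = [(i, (i + 1) % 20) for i in ring_nodes]
observed_group = frozenset(range(5))

rng = random.Random(0)
samples = sample_groups(ring_nodes, len(observed_group), 1000, rng)
values = [internal_links(ring_edges, group) for group in samples]
gap = standardized_gap(internal_links(ring_edges, observed_group), values)
```

In this toy case the tested group has many more internal links than a typical random group of the same size, so `gap` is positive.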

FIG. 4. Results of the test with the same cardinality constraint. For each group X° = GEC, STP, JUP
and SEP, the scalar $d$ (bottom right-hand corner of each figure) is an estimate of the distance
between the statistical behavior of the bootstrap samples (boxplots) and the real data (big black
crosses); $\chi^2$ and $\sigma_u$ are two control parameters of the "uniformity" of the test (see
text).

Figure 3a displays two histograms that show what the outputs $\sigma_u$ and $\chi^2$ aim at
quantifying. On the left-hand side, the histogram shows the number of times each node is chosen in
the bootstrap set: the standard deviation $\sigma_u$ quantifies whether the choice is uniformly
random or not. On the right-hand side, the distribution of the number of nodes of X° (GEC) chosen in
each bootstrap sample is displayed: the $\chi^2$ value measures the distance between the theoretical
hypergeometric distribution and the actual one.
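
The two uniformity indicators can be illustrated with a short sketch. It assumes that $\sigma_u$ is the standard deviation of the per-node occurrence counts and that the $\chi^2$ distance is Pearson's statistic between the observed and hypergeometric counts; the normalization actually used in the paper may differ, and the function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def occurrence_std(samples, nodes):
    """Standard deviation of the number of times each node appears in the samples."""
    counts = Counter(node for sample in samples for node in sample)
    return statistics.pstdev([counts[node] for node in nodes])


def chi2_to_hypergeometric(samples, reference, n_nodes):
    """Pearson chi-square between the observed overlap counts and the hypergeometric law."""
    size = len(samples[0])
    observed = Counter(len(sample & reference) for sample in samples)
    chi2 = 0.0
    for k in range(min(size, len(reference)) + 1):
        p_k = (math.comb(len(reference), k)
               * math.comb(n_nodes - len(reference), size - k)
               / math.comb(n_nodes, size))
        expected = len(samples) * p_k
        if expected > 0:
            chi2 += (observed[k] - expected) ** 2 / expected
    return chi2


nodes = list(range(30))
reference = frozenset(range(6))  # plays the role of the tested group
rng = random.Random(0)
uniform = [frozenset(rng.sample(nodes, 6)) for _ in range(500)]
frozen = [reference] * 500  # degenerate case: every sample is the tested group itself
```

Uniformly drawn samples give small values for both indicators, whereas the degenerate set `frozen` makes both of them very large, which is the behavior used in the text to detect constraints that are too strong.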

The top left plot of Figure 4 summarizes the output of this test for GEC: it compares, for each
feature, the boxplots of the bootstrap samples with the measured behavior of GEC (black crosses)
and gives the values of $d$, $\chi^2$ and $\sigma_u$ in the bottom right-hand corner of the figure.
The indicators $\chi^2$ and $\sigma_u$ are small: the bootstrap set and the test are valid; $d$ is
non-null: the Null Hypothesis is rejected. The other three plots of Figure 4 show the results for
the three other groups (SEP, JUP, STP).

On each boxplot, the central red line is the median, the edges of the box are the 25th and 75th
percentiles and the whiskers extend to the most extreme data points that are not considered as
outliers. Points are drawn as outliers (small red crosses) if they are larger than
$q_3 + w(q_3 - q_1)$ or smaller than $q_1 - w(q_3 - q_1)$, where $q_1$ and $q_3$ are the 25th and
75th percentiles, respectively. We use the value $w = 1.5$, which corresponds to approximately
$\pm 2.7\sigma$ and 99.3% coverage if the data were normally distributed.
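
The outlier rule and the two figures quoted for normally distributed data can be checked with a few lines. This is a sketch: the quartile convention (`method="inclusive"`) is an assumption, since plotting software may interpolate percentiles differently, and the sample data are made up.

```python
from statistics import NormalDist, quantiles


def outlier_fences(data, w=1.5):
    """Return (lower, upper) fences q1 - w*(q3 - q1) and q3 + w*(q3 - q1)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - w * spread, q3 + w * spread


# Made-up sample with one obvious outlier.
lower, upper = outlier_fences([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# For a standard normal distribution, q1 = -q3, so the upper fence is q3 + 1.5 * 2 * q3.
std_normal = NormalDist()
q3_normal = std_normal.inv_cdf(0.75)
fence_in_sigma = q3_normal + 1.5 * (2 * q3_normal)
coverage = 2 * std_normal.cdf(fence_in_sigma) - 1
```

With $w = 1.5$, `fence_in_sigma` is close to 2.7 and `coverage` close to 0.993, in agreement with the values stated above.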

All four groups have a non-null divergence $d$ and small $\chi^2$ and $\sigma_u$: the Null
Hypothesis is rejected for all. In other words, none of these groups of individuals behaves
similarly to a random group of nodes with the same cardinality. These results do not come as a
surprise since, as previously mentioned, each of these groups is a priori a community and indeed
behaves as such: compared to the bootstrap samples, they tend to have larger $Q_X$, $N_{XX}$,
$N_{RR}$, $T_{XX}$, $T_{RR}$ and smaller $N_{XR}$, $T_{XR}$. Interestingly, GEC's divergence is
clearly larger than the others: this first test, even if somewhat naive, hints at some difference
between GEC and the other groups.



C. More elaborate constraints



In order to better discriminate GEC's behavior from the behavior of other groups, we need to
consider more refined Null Hypotheses, i.e., stronger constraints on the bootstrap samples. There
is however a trade-off between the test's ability to discriminate and constraints so strong that
they jeopardize the test's validity. Indeed, we expect $\chi^2$ and $\sigma_u$ to increase if we
impose stronger constraints on the bootstrap samples. In the following, we explore cases that lie
between extreme cases with large $\chi^2$ and $\sigma_u$ (invalidating the test), and the weak test
performed in the previous paragraph, in which the only constraint is given by the cardinality. We
explore this trade-off, making sure that the two control parameters $\chi^2$ and $\sigma_u$ stay
small in order to obtain valid but discriminating tests (see Section IV E for a precise description
of this trade-off).

We first consider the following refined Null Hypothesis, which takes into account the high
modularity of X°: X° behaves like any random group of nodes with the same cardinality and the same
modularity as X° (hence forming a community as strong as X°). In fact, requiring the exact same
modularity is too strong a constraint, so we relax it to
$Q_{X^\circ}(1 - \delta) < Q_X < Q_{X^\circ}(1 + \delta)$, with $\delta$ the error we tolerate. The
value of $\delta$ tunes the strength of the constraint: the lower $\delta$, the stronger the
constraint (see Section IV E). In the following, $\delta = 5\%$. The set of bootstrap samples is
found using simulated annealing, as presented in Section III B. Figure 3b shows the same two
histograms as Figure 3a, but for the bootstrap samples under this new constraint (for X° = GEC).
As expected, they show a higher $\sigma_u$ and $\chi^2$, yet not so large that the uniform character
of the bootstrap samples would be questionable. The results for the four studied groups are
summarized in Figure 5. First, we see that the boxplots are no longer centered around zero: they
indeed need to be in accordance with a high modularity (typically: high $N_{XX}$, $T_{XX}$ and low
$N_{XR}$ and $T_{XR}$). Divergences are null for STP and JUP, while the divergence for SEP is almost
ten times smaller than for GEC: this test shows that GEC's behavior is peculiar with respect to the
other groups considered.
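
To make the relaxed constraint concrete, the short sketch below computes the tolerance band for a target value and checks whether a candidate value falls inside it. The function names are hypothetical; the target values 0.100 and 101 are the modularity and the number of internal links of GEC reported in Table II.

```python
def tolerance_band(target, delta):
    """Open interval (target * (1 - delta), target * (1 + delta)), ordered low to high."""
    bounds = (target * (1 - delta), target * (1 + delta))
    return min(bounds), max(bounds)


def satisfies(value, target, delta):
    """True if `value` lies strictly inside the tolerance band around `target`."""
    low, high = tolerance_band(target, delta)
    return low < value < high


# Modularity constraint for GEC with delta = 5%: roughly 0.095 < Q_X < 0.105.
q_low, q_high = tolerance_band(0.100, 0.05)

# Internal-link constraint for GEC with delta = 5%: admissible integer values of N_XX.
admissible_links = [n for n in range(90, 115) if satisfies(n, 101, 0.05)]
```

For the link-count constraint, the band around 101 with a 5% tolerance admits the integer values 96 to 106.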

Other Null Hypotheses, implying other constraints, can be considered: imposing
$N_{XX} = N_{X^\circ X^\circ}$, imposing $T_{XX} = T_{X^\circ X^\circ}$, or imposing
$2T_{XX} + T_{XR} = 2T_{X^\circ X^\circ} + T_{X^\circ R^\circ}$. These constraints are ways to
impose the amount of interactions involving nodes of each group, respectively in terms of numbers
of contacts, of the cumulated duration of contacts inside the group, or of the duration of all
contacts involving individuals in this group. Results are summarized in Fig. 6 for these three
constraints (in each case, the cardinality constraint is also imposed): the divergence from the
bootstrap samples is always much larger for GEC than for the other groups. Note that each constraint
(except the cardinality constraint) is relaxed in the same way as the modularity constraint, with
$\delta = 5\%$.

Even though the modularity constraint is the most successful in discriminating GEC from the 
three other groups, the three other tests show corroborative evidence of GEC's peculiar behavior. 
The outputs of all the different tests are consistent, and they show not only that GEC behaves in a 
peculiar fashion, but also in what ways GEC behaves differently. For instance, under the constraint 
of fixed modularity, the boxplots for GEC show that it has particularly high $N_{XX}$, $N_{RR}$,
$T_{RR}$ while having very low $N_{XR}$, $T_{XR}$ and slightly low $T_{XX}$ features as compared to
random groups of
nodes with the same modularity: the precise reasons for the rejection of the Null Hypothesis are 
highlighted. 



D. Different locations 



As discussed in Section II and exhibited by the distribution of the contact durations measured
in the different areas of the conference, some spatial heterogeneity is observed in the data. We
take advantage of the coarse localization of the contacts to investigate the effect of the location
on the behavior of the groups. To this aim, we perform separate tests on the data collected in the
three different areas: the GEC area, the Poster Hall and the Rest. Results are presented in Fig. 7.
The test clearly differentiates GEC's behavior from the others' in all areas. However, the
difference in behaviors (and the divergence obtained with the GEC group) is smaller when measured in
the Poster Hall or in the "Rest" of the conference. This result has a simple interpretation: first,
GEC mixed significantly more with DPP in the areas that were actually common to both conferences;
second, the members of GEC who went to the location of the DPP conference did indeed mix with
DPP. This leads us to a somewhat obvious remark: organizing activities in common physical spaces
favors the mixing between two communities.

FIG. 5. Results of the test with the constraints of fixed cardinality and fixed modularity
($\delta = 5\%$) for the four groups.



Null Hypothesis                                  GEC             STP            JUP            SEP
Fixed cardinality constraint                     (38, 10, 10)    (14, 6, 15)    (3.1, 12, 14)  (8.5, 9, 15)
Fixed cardinality and N_XX constraint            (68, 1876, 96)  (14, 236, 98)  (2, 84, 69)    (19, 7, 23)
Fixed cardinality and T_XX constraint            (32, 165, 63)   (4, 804, 121)  (0, 9, 53)     (15, 20, 31)
Fixed cardinality and T_XX + T_XR/2 constraint   (46, 156, 49)   (12, 237, 70)  (5, 8, 16)     (9, 25, 43)
Fixed cardinality and Q_X constraint             (18, 361, 43)   (0, 93, 29)    (0, 13, 25)    (2.2, 20, 24)

FIG. 6. Summarized results for various sets of constraints. Each entry of the table represents the
triplet $(d, \chi^2, \sigma_u)$. All constraints except the cardinality constraint are relaxed with
$\delta = 5\%$.



E. Trade-off between the constraint(s) strength and the validity of the test 



We now turn to the trade-off between the constraints' strength and the validity of the test. Strong
constraints make $\chi^2$ and $\sigma_u$ increase, but until now, we did not specify what we mean by
"too large" for these two indicators (i.e. what we mean by "too strong" for a constraint). We
address this question in the following.

As mentioned previously, the parameter $\delta$ enables us to tune the "strength" of a given
constraint: the lower $\delta$, the stronger the constraint. The trade-off is between a small
$\delta$, which ensures that the bootstraps keep the wanted correlations but reduces the space of
possible bootstraps, hence jeopardizing the observables' statistics, and a large $\delta$, which
ensures a large space of possible bootstraps but creates bootstraps with fewer correlations. There
is an analogy between this discussion and the one on the optimal size $l^*$ of the blocks in block
bootstrapping [21, 22] ($\delta$ is analogous to $l$). With Monte Carlo methods, we could try to
estimate the reduction of the space of possible bootstraps due to a strong constraint. This is
intractable in our case, where the space of possibilities is huge (there are $\binom{n}{k}$ groups
of $k$ nodes in a set of $n$ nodes) and where the reduction of space due to a constraint is possibly
drastic. We instead use $\chi^2$ and $\sigma_u$ as two indirect measures of this reduction of space.
Indeed, in the extreme case where we impose the bootstrap samples to be exactly the X° group by
using too strong constraints (the size of the space of possible bootstraps is here reduced to one),
$\sigma_u$ is larger than 300, and $\chi^2$ larger than $10^{48}$ (the expected number of bootstrap
samples having all 39 nodes in X° is $P(V) \times N \simeq 10^{-48}$). In some sense, the simulated
annealing procedure acts as a biased Monte Carlo estimation of the size of the reduced space of
possible bootstraps.

Null Hypothesis        Area         GEC             STP           JUP          SEP
Same cardinality       GEC area     (96, 9, 10)     (14, 16, 16)  (1, 4, 14)   (0, 17, 14)
constraint             Poster Hall  (12, 11, 10)    (9, 16, 15)   (1, 2, 13)   (2, 13, 14)
                       The Rest     (9, 6, 10)      (11, 8, 15)   (5, 10, 13)  (9, 13, 15)
Same cardinality and   GEC area     (80, 1144, 32)  (10, 12, 22)  (1, 12, 21)  (0, 6, 19)
same modularity        Poster Hall  (11, 32, 19)    (2, 25, 24)   (0, 11, 18)  (1, 5, 15)
constraint             The Rest     (7, 6, 11)      (0, 65, 28)   (0, 30, 29)  (0, 49, 30)

FIG. 7. Results for the tests with fixed cardinality constraint and fixed cardinality and modularity
constraints for the different locations within the conference. Each entry of the table represents
the triplet $(d, \chi^2, \sigma_u)$.

Given a confidence level $\alpha$ for the test to be valid, it would be very valuable to formally
obtain thresholds $\chi^{2*}$ and $\sigma^*$ under which the space of possible bootstraps is large
enough and the test thereby considered valid. Unfortunately, even in the easier case of block
bootstrapping, automatic estimation of the optimal size $l^*$ has only been obtained for specific
estimators and the general question remains open.

These thresholds $\chi^{2*}$ and $\sigma^*$ in turn give a threshold value $\delta^*$ that
represents the trade-off we are looking for. In Fig. 8, we plot the evolution of the results of the
test $(d, \chi^2, \sigma_u)$ with respect to $\delta$ for the four groups. Fig. 8a is for the same
cardinality constraint and the same $N_{XX}$ constraint, and Fig. 8b for the same cardinality
constraint and the same $T_{XX}$ constraint. Naturally, $\chi^2$ and $\sigma_u$ decrease with
$\delta$ as the constraint is gradually relaxed, until they converge towards the values obtained
with the same cardinality constraint. As we do not have an analytical link between a confidence
level $\alpha$ and the values of the thresholds, we are bound to set them from empirical
observation: we propose $\chi^{2*} = 1000$ and $\sigma^* = 80$. Fig. 3b shows two histograms that
are considered acceptable ($\chi^2 < \chi^{2*}$ and $\sigma_u < \sigma^*$). Fig. 17 in the annex
shows two histograms that are considered unacceptable. $\delta^*$ is defined for a given constraint,
a given graph, and a given subgroup X°. If we want to compare the output for four different X°
groups for a given constraint, we need to choose the largest $\delta^*$ out of the four.

Looking at Fig. 8, we obtain $\delta^* = 20\%$ for the $N_{XX}$ constraint and $\delta^* = 25\%$
for the $T_{XX}$ constraint. We do not show the results for the other two constraints, but we
estimated $\delta^* = 3\%$ for the $2T_{XX} + T_{XR}$ constraint and $\delta^* = 0.001\%$ for the
modularity constraint.
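
The selection of the threshold tolerance can be sketched as follows, under the assumption that it is read off a sweep over tested tolerance values as the smallest one from which both indicators stay below the empirical thresholds 1000 and 80. The function names are hypothetical and the sweep values below are invented for illustration; they are not read from Fig. 8.

```python
CHI2_MAX = 1000.0   # empirical threshold proposed in the text
SIGMA_MAX = 80.0    # empirical threshold proposed in the text


def is_valid(chi2, sigma_u):
    """A test is considered valid when both uniformity indicators are below threshold."""
    return chi2 < CHI2_MAX and sigma_u < SIGMA_MAX


def delta_star(sweep):
    """Smallest tested tolerance such that it and all larger tested ones are valid.

    sweep: dict mapping a tolerance to the pair (chi2, sigma_u) measured for it.
    Returns None if even the largest tested tolerance is invalid.
    """
    best = None
    for delta in sorted(sweep, reverse=True):
        if not is_valid(*sweep[delta]):
            break
        best = delta
    return best


def common_delta_star(sweeps_by_group):
    """Largest threshold over the groups, so that the test is valid for all of them."""
    thresholds = [delta_star(sweep) for sweep in sweeps_by_group.values()]
    if any(t is None for t in thresholds):
        return None
    return max(thresholds)


# Invented sweeps for two hypothetical groups.
sweeps = {
    "group_1": {0.05: (1900.0, 95.0), 0.10: (1200.0, 85.0), 0.20: (700.0, 60.0), 0.30: (100.0, 30.0)},
    "group_2": {0.05: (1500.0, 90.0), 0.10: (150.0, 70.0), 0.20: (50.0, 40.0), 0.30: (20.0, 20.0)},
}
shared_threshold = common_delta_star(sweeps)
```

Taking the maximum over the groups reflects the remark above: the same tolerance must be acceptable for every group being compared.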



V. THE CASE OF WEIGHTED RANDOM GRAPHS 



In this section, we test our methodology in a controlled setting. We investigate if the proposed 
test can distinguish between a large random fluctuation of behavior - which is rare but can still 
be considered as "normal" - and a truly abnormal behavior. To address this issue, we apply the 
method to a specific type of network that follows a standard model of complex networks, the
so-called Chung-Lu model [13]: this model produces random networks with a pre-defined distribution
of
degrees. In the following, we first adapt this model to weighted graphs, before presenting our 
results. 



A. Weighted Chung-Lu graphs 

A Chung-Lu graph [13, 23] is a random graph with a given expected degree sequence. Consider
$(k_i)_{i=1,\dots,V}$ the expected degree sequence and $W = \frac{1}{2}\sum_i k_i$ the expected
total number of edges. In a Chung-Lu graph, the probability that a given edge (connecting nodes $i$
and $j$) exists is given by $\min\left(1, \frac{k_i k_j}{2W}\right)$.
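
A binary Chung-Lu graph can be generated with a few lines. This is a minimal sketch of the standard construction with edge probability $\min(1, k_i k_j / 2W)$; it ignores self-loops, which is a simplifying assumption, and the function name is hypothetical.

```python
import random


def chung_lu_edges(expected_degrees, rng):
    """Sample the edge list of a binary Chung-Lu graph (no self-loops)."""
    two_w = sum(expected_degrees)  # equals 2W, twice the expected number of edges
    if two_w == 0:
        return []
    edges = []
    n = len(expected_degrees)
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, expected_degrees[i] * expected_degrees[j] / two_w)
            if rng.random() < p:
                edges.append((i, j))
    return edges


# 200 nodes with expected degree 10 each: about 1000 edges are expected.
toy_edges = chung_lu_edges([10] * 200, random.Random(0))
```

The double loop is quadratic in the number of nodes, which is acceptable for networks of a few hundred nodes such as the ones considered here.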

FIG. 8. Evolution of the triplet $(d, \chi^2, \sigma_u)$ with the constraints of (a) fixed
cardinality and fixed total number of internal links ($N_{XX}$) and (b) fixed cardinality and fixed
total time of contact $T_{XX}$, with respect to $\delta$ for the four groups. We observe a
convergence towards the fixed cardinality constraint results (dashed horizontal lines).



As we are interested here in weighted networks, we first introduce a model of weighted Chung-Lu
graph: to this aim, we create a non-weighted (or binary) Chung-Lu graph, and allocate a weight
to each edge. In real networks, weights and topology are often not independent. This is the
case in the networks of face-to-face contacts considered previously, as shown in Fig. 9, which
displays the average strength of nodes as a function of their degree (we recall that the strength of
a node is the sum of the weights of its edges). In order to produce random weighted networks
exhibiting similar correlations, we compute from the real data, for each degree $k$, the empirical
distribution of the weights of the links attached to nodes of degree $k$. We model each of these
distributions by a power law to obtain an estimated distribution $P_k(w)$ [25]. A weighted Chung-Lu
graph is thus built in the following way: we start by creating a binary Chung-Lu graph with the same
expected degree sequence as the data. For each node $i$ (of degree $k_i$) of this Chung-Lu graph,
we draw weights from the appropriate distribution $P_{k_i}(w)$ and randomly allocate them to its
links whose weight has not yet been specified (if $i$ is linked to a node $j$ that has already been
considered, then the weight of link $i$-$j$ has already been chosen by using $P_{k_j}$ and it does
not need to be computed again). In this way, the weight sequence will be similar to the empirical
graph's, if not exactly the same. We thereby obtain a weighted Chung-Lu graph with the same expected
degree sequence, the same strength-degree correlation and a similar weight sequence as the empirical
graph. Figure 9 shows in particular that the strength-degree correlation of such a weighted Chung-Lu
graph is in agreement with the empirical data. Each Chung-Lu graph we generate can be seen
as a topologically randomised version of the graph of contacts we measured.
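
The weight allocation step can be sketched as follows. Several elements are assumptions made for illustration only: the nodes are visited in sorted order, the power law for degree $k$ is summarized by an exponent returned by a hypothetical callable `exponent_for_degree`, and the lower cut-off of 60 seconds simply mirrors the 1-minute filtering threshold used on the empirical graph.

```python
import random
from collections import defaultdict


def draw_power_law(rng, exponent, w_min):
    """Inverse-transform sample from a density proportional to w**(-exponent), w >= w_min."""
    return w_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0))


def allocate_weights(edges, exponent_for_degree, w_min, rng):
    """Give each edge a weight drawn from the law of the first endpoint visited."""
    incident = defaultdict(list)
    for u, v in edges:
        incident[u].append((u, v))
        incident[v].append((u, v))
    weights = {}
    for node in sorted(incident):
        exponent = exponent_for_degree(len(incident[node]))  # must be > 1
        for edge in incident[node]:
            if edge not in weights:  # skip links already weighted from the other endpoint
                weights[edge] = draw_power_law(rng, exponent, w_min)
    return weights


# Toy binary graph and an invented degree-dependent exponent.
binary_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
weighted_toy = allocate_weights(binary_edges, lambda k: 2.0 + 0.1 * k, 60.0, random.Random(1))
```

Because the weights drawn for a node are independent and identically distributed, assigning them to the pending links in any order is equivalent to allocating them randomly.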



Note that this randomisation is in no way related to the one proposed for the bootstrap samples. 
Hence, it is possible to use these weighted Chung-Lu graphs as a controlled input to test the 
method. 

FIG. 9. Average strength versus degree of nodes in three different scientific conferences. The
squares represent the same quantity for a weighted Chung-Lu graph generated from the empirical
distributions of the SLC contact network.



B. Groups' modularity in weighted Chung-Lu graphs

Let us consider random groups of nodes of cardinality M = 39 (like GEC) and study their
modularity in weighted Chung-Lu graphs. Figure 10a reports the histogram of the modularity of
such randomly drawn groups. The histogram was obtained by computing 500 different realisations
of weighted Chung-Lu graphs and measuring, for each realisation, the modularity of 1000 groups
of 39 nodes.

Figure 10b shows the histogram of the maximum modularity found in a weighted Chung-Lu
graph. This second histogram was obtained by looking for a group (of cardinality 39) of maximum
modularity in 740 different Chung-Lu graphs using simulated annealing. Because the number
of possible groups of 39 nodes is huge, this histogram is only an estimation of the maximum
modularity: we stop the simulated annealing search after an arbitrary amount of time without
new result. Interestingly, this histogram seems to fit a Weibull distribution, the extreme value
distribution in the case of the existence of an upper bound. The $\chi^2$ goodness-of-fit test fails
(with a p-value of $10^{-5}$) but this could be accounted for by the fact that we only have an
estimation of the extremal value. In fact, we expect a Weibull distribution for the maximum of
independent identically distributed random variables having an upper bound, like modularity.



C. Proposing a controlled model 



Let us insist on an important distinction that is at the heart of our discussion: the difference 
between rare and abnormal events. For instance, in these Chung-Lu graphs, an overwhelming 
majority of groups have modularity lower than 0.08 (out of half a million random tries, none had 
a modularity higher than 0.08). We can therefore affirm that a group with a modularity of, say, 
0.16 is rare; however, it is not necessarily abnormal. If it is not too rare (which translates, in
the bootstrap approach, to: if $\sigma_u$ and $\chi^2$ are not too high), then we have enough
bootstrap samples to compare it with, and we are able to test whether or not it is abnormal with
respect to other groups with the same modularity. If it is too rare (i.e. if $\sigma_u$ and
$\chi^2$ are too high), then we do not
have enough statistics and we are unable to conclude. 

For each generated weighted Chung-Lu graph, we choose 10 groups with modularities incremented
from 0 (the most common) to 0.18 (the rarest). We apply the method to each of these
groups (X° being alternatively each of these 10 groups), and we repeat on a large number of
Chung-Lu graphs to obtain the average performance of the method. We use four different Null
Hypotheses for the bootstrap tests, corresponding to: i) the fixed cardinality constraint ("Card"),
ii) the constraints of fixed cardinality and total number of links within the group ("$N_{XX}$"),
iii) the constraint of fixed cardinality and total time of interaction within the group
("$T_{XX}$"), and iv) the constraint of fixed cardinality and modularity ("Modu"). As the graphs are
random, all groups (whether common or rare) are in some sense "normal" and, on average, should be
classified as such by the method.

FIG. 10. a) Histogram of the modularity of a sub-graph of 39 nodes in a weighted Chung-Lu graph and
b) Histogram of the estimated maximal modularity of a group of 39 nodes in a weighted Chung-Lu
graph. This histogram is fitted with a Weibull distribution.

FIG. 11. Median performance of the method with respect to the "rarity" of a group (here quantified
by its modularity) on the weighted Chung-Lu model.

An important remark is that we choose here to quantify the "rarity" of a group with respect to 
its modularity, and then test how well the tests differentiate rare groups from abnormal ones. This 
choice of definition for the rarity of a group has of course an impact on the performances of the 
tests. We could have chosen to quantify the rarity with respect to other features, such as Nxx 
for instance. Our purpose here is however not to test the methodology on an exhaustive list of 
controlled models, but rather to illustrate a way to test the methodology with a given definition of 
rarity. 



D. Results of bootstrap tests on groups in weighted Chung-Lu networks 



We test the method on 240 weighted random Chung-Lu graphs and show the median of the
outputs $(d, \chi^2, \sigma_u)$ in Fig. 11 as a function of the modularity of the studied groups.



E. Interpretation 



As expected, $\chi^2$ and $\sigma_u$ increase monotonically with the rarity of the considered groups
(the higher the modularity, the rarer the groups). Following the discussion of Section IV E, we
define a "threshold of uniformity" beyond which the test is considered untrustworthy: if $\chi^2$ is
larger than $\chi^{2*} = 1000$ and/or $\sigma_u$ larger than $\sigma^* = 80$, the output of the
method is that no conclusion can be drawn because the considered group is too rare.
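
The resulting three-way decision can be written as a small function. This is a sketch of the rule as described in the text, assuming that the divergence $d$ is non-negative and that any strictly positive value rejects the Null Hypothesis; the function name and labels are hypothetical. The example triplets are taken from Fig. 6.

```python
def classify(d, chi2, sigma_u, chi2_max=1000.0, sigma_max=80.0):
    """Return 'too rare', 'abnormal' or 'normal' for a triplet (d, chi2, sigma_u)."""
    if chi2 > chi2_max or sigma_u > sigma_max:
        return "too rare"  # the bootstrap set is not uniform enough to conclude
    return "abnormal" if d > 0 else "normal"


# Triplets from Fig. 6 (all with a 5% tolerance).
gec_modularity = classify(18, 361, 43)   # GEC, cardinality and Q_X constraint
stp_modularity = classify(0, 93, 29)     # STP, cardinality and Q_X constraint
gec_links = classify(68, 1876, 96)       # GEC, cardinality and N_XX constraint
```

The uniformity check comes first on purpose: a large divergence is not interpreted when the indicators show that the bootstrap set is too constrained.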

In this context, the modularity constraint test has a very good performance: it correctly classifies
the groups as normal for modularities under 0.12 and classifies the groups as too rare beyond. It
never misclassifies groups as abnormal. Interestingly, even though the observable $T_{XX}$ is not
obviously correlated to the modularity, the $T_{XX}$ constraint test also performs well: it
correctly classifies the groups as normal for modularities under 0.08 and classifies the groups as
too rare beyond; and it never misclassifies groups as abnormal. On the other hand, the cardinality
(resp. $N_{XX}$) constraint test performs poorly. It never classifies the groups as too rare to give
an answer, and starts to misclassify groups as abnormal from a modularity of 0.02 (resp. 0.04). This
does not come as a surprise. In fact, we have seen that the concept of "normality" is defined with
respect to a Null Hypothesis. We see here that "normality" with respect to the modularity is not the
same as "normality" with respect to $N_{XX}$. More specifically, a rare group (with respect to
modularity) is detected as abnormal with respect to $N_{XX}$.

The modularity of GEC in the SLC conference dataset is 0.10. We cannot exactly compare
this modularity with the modularities of random groups of weighted Chung-Lu graphs because the
empirical network has a well-defined structure that is different from a random Chung-Lu graph. The
order of magnitude is however relevant and lies in the validity region of the test. The conclusion of 
this section is that the bootstrap approach presented to test normality of groups in the conference 
dataset is appropriate and does not inappropriately classify groups as abnormal. 



VI. CONCLUSION 

We have proposed in this work a generic method to compare the behavior of specific groups of 
nodes within a given weighted complex network. The method is inherently flexible: depending 
on the issue addressed in the data at hand, some observables and Null Hypotheses will be more 
appropriate than others. We show via the construction of a controlled model that our method is 
robust with respect to random fluctuations of behavior and that it is able to distinguish a rare 
behavior from a truly abnormal one. We have shown on a new dataset of time-resolved, face-to- 
face human contacts collected during two co-located conferences, that the smaller conference was 
indeed seen as an abnormal group in a statistically significant way. It has fewer contacts
and shorter interaction durations with people from the other conference, even when accounting for
its
organization as a community of high modularity. Another finding was that the mixing was better 
in spaces that were shared by the two conferences. 

More generally, the proposed method for bootstrapping and statistical testing in complex networks
can be used in a larger setting: it can be applied to any type of data that can be modelled by
graphs. Future work includes applying this method to data collected at various times of the day.
Another
development would be to propose Null Hypotheses that directly involve the dynamic behavior of 
groups and not only their aggregated behavior over time. 



ANNEX



A. Map 



Figure 12 shows the map of the conference venue in Salt Lake City. GEC was situated around 
antennas 20 and 21: very far from the rest. 



B. Comparison between three conferences

The social interactions measured in all three conferences (HTT09, SFHH and SLC) show similar
distributions. For instance, Figure 13 shows the distribution of the duration of intercontacts. The
distribution of degrees is shown in Fig. 14 and the distribution of the weights of the links in
Fig. 15. Finally, the small-world property is shown in Fig. 16.

FIG. 12. General map of the conference venue with the three different general areas. Each black
circle corresponds to one of the 25 antennas used to measure the social interactions. The GEC area
is isolated: it is 500 meters away from the Poster Hall.

FIG. 13. Distribution of the duration of intercontact: duration of the time, for a node, between two
starts of contacts.

FIG. 14. Distribution of degrees.

FIG. 15. Distribution of weights of the links.

FIG. 16. Histogram of the shortest paths' length. It shows the small-world property.

FIG. 17. Example of a constraint considered as too strong: the same cardinality constraint and the
same total number of internal links ($N_{XX}$) with $\delta = 5\%$ for X° = GEC.



